Bioinformatics (Thomas Dandekar, Meik Kunz)

In any case, it is advisable to check the quality parameters of a database first, in order to assess the actual information content and how usable the information is for one's own scientific work and the conclusions drawn from it.


Users should therefore always take a look at the quality parameters before using the

information provided.

With that, we understand how bioinformatics can now work so quickly and yet so reliably. It uses programs that are fast and yet surprisingly accurate (heuristics), and it relies on good, highly sophisticated databases whose entries can be trusted and that are very well maintained.

A few other notable heuristics should therefore be mentioned here. Besides the BLAST sequence search, the BLAT search offers a further speedup, as does Mega-BLAST (the expert then knows what these BLAST variants are more likely to overlook).
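The trade-off between speed and sensitivity can be illustrated with a minimal sketch of exact word seeding, the general idea behind this family of tools. This is not the actual BLAST, BLAT, or Mega-BLAST implementation; the function names and the toy sequences are assumptions made for illustration only. The sketch shows that a longer word size means fewer lookups but can miss a hit that contains a single mismatch.

```python
from collections import defaultdict


def build_kmer_index(subject: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the subject to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs that share an exact k-mer."""
    index = build_kmer_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


# Toy data (hypothetical): the query equals subject[3:15] with one mismatch.
subject = "ACGTTGCAACGTAGCTAGGCTA"
query = "TTGCAACCTAGC"

short_word_seeds = find_seeds(query, subject, k=4)  # short words: sensitive
long_word_seeds = find_seeds(query, subject, k=8)   # long words: fast, less sensitive

print("seeds with k=4:", short_word_seeds)
print("seeds with k=8:", long_word_seeds)
```

With the short word size the region is still found through seeds on either side of the mismatch. With the long word size every word of the query spans the mismatch, so no seed is produced and the hit is overlooked.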

Even the prediction of 3-D structures is made faster by heuristic searches. In particular, many reasonably fast modeling programs use homology modeling, that is, they use known structures to model the unknown structure if it is sufficiently similar. This heuristic does not yield an exact model and assumes that the new structure is sufficiently similar to a known one. The heuristic assumption is even stronger in threading. Here it is assumed that even an unknown 3-D structure can be predicted by combining and testing known 3-D structures. To do this, the unknown structure is threaded onto the known 3-D structures on the basis of its sequence. One then calculates which region is best covered by which known structure. This is not exact, just a heuristic.
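The last step, deciding which region is best covered by which known structure, can be sketched as a toy assignment of query regions to templates. Real threading programs score how well a sequence fits a structural environment; here plain sequence identity is used as a placeholder score, and the template names, sequences, and window size are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_template_per_region(query: str, templates: dict[str, str], window: int):
    """For each non-overlapping query window, pick the best-scoring template segment.

    Returns a list of (query_start, template_name, template_offset, score).
    """
    assignments = []
    for start in range(0, len(query) - window + 1, window):
        region = query[start:start + window]
        best = (0.0, None, None)  # (score, template name, template offset)
        for name, seq in templates.items():
            for offset in range(len(seq) - window + 1):
                score = identity(region, seq[offset:offset + window])
                if score > best[0]:
                    best = (score, name, offset)
        assignments.append((start, best[1], best[2], best[0]))
    return assignments


# Hypothetical sequences of two templates with known 3-D structure.
templates = {"tmplA": "MKVLAAGIT", "tmplB": "PDEFGHWQR"}
query = "MKVLAEFGHW"

coverage = best_template_per_region(query, templates, window=5)
for start, name, offset, score in coverage:
    print(f"query region starting at {start}: {name} (offset {offset}, score {score:.2f})")
```

In this toy case the first half of the query is covered by one template and the second half by the other, which is the combining of known structures described above.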

One may be surprised how quickly the protein interaction database STRING (EMBL) calculates interactions. It uses a trick that a number of other databases also use: all interactions are precomputed, over many weeks, with each update of the database. A single database query then only has to look up where the best entry for the query is located in the database. If one or more sequences are entered, this is done via a sequence comparison (with BLAST); if a keyword is entered, it is done via a fast text search.
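The precompute-then-look-up trick can be sketched as follows. This is a minimal illustration, not STRING's actual pipeline: the scoring function is a made-up placeholder for an expensive computation, and the protein names and sequences are hypothetical.

```python
from itertools import combinations


def expensive_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a costly pairwise interaction computation."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


def precompute_partners(proteins: dict[str, str]) -> dict[str, list[tuple[str, float]]]:
    """Offline step (run once per database update): score every pair and store it."""
    partners = {name: [] for name in proteins}
    for p, q in combinations(sorted(proteins), 2):
        score = expensive_score(proteins[p], proteins[q])
        partners[p].append((q, score))
        partners[q].append((p, score))
    for name in partners:
        partners[name].sort(key=lambda item: item[1], reverse=True)
    return partners


def query_partners(partners, name: str, top_n: int = 3):
    """Online step: nothing is computed, the stored entry is only looked up."""
    return partners.get(name, [])[:top_n]


proteins = {"protA": "MKTAYIAK", "protB": "MKTAYLAK", "protC": "GGSGGSGG"}
table = precompute_partners(proteins)  # slow part, done in advance

top_hit = query_partners(table, "protA", top_n=1)  # fast part, done per query
print(top_hit)
```

The cost of scoring all pairs is paid once per update, so the time for a single query depends only on the lookup and not on the number of pairwise computations.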

Metabolic models often make the heuristic assumption of a steady state and then calculate the underlying enzyme chains for this steady state (flux balance analysis; elementary mode analysis uses the same principle). Even if, for example, YANAsquare calculates flux strengths, it makes the simplified assumption that gene expression data

6.2 Maintenance of Databases and Acceleration of Programs